APOC1 as a novel diagnostic biomarker for DN based on machine learning algorithms and experiment

Introduction: Diabetic nephropathy (DN) is the leading cause of end-stage renal disease and imposes a heavy economic burden on individuals and society, yet effective and reliable diagnostic markers are still not available. Methods: Differentially expressed genes (DEGs) were identified in DN patients and functional enrichment analysis was performed. A weighted gene co-expression network (WGCNA) was also constructed. The Lasso and SVM-RFE algorithms were then applied to screen for core secreted genes in DN. Lastly, WB, IHC, IF, and ELISA experiments were used to examine hub gene expression in DN, and the findings were confirmed in mouse models and clinical specimens. Results: 17 hub secretory genes were identified by intersecting the DEGs, the key module genes from WGCNA, and the secreted genes. 6 hub secretory genes (APOC1, CCL21, INHBA, RNASE6, TGFBI, VEGFC) were obtained by the Lasso and SVM-RFE algorithms. APOC1 showed elevated expression in renal tissue of a DN mouse model and is probably a core secretory gene in DN. Clinical data demonstrate that APOC1 expression is significantly associated with proteinuria and GFR in DN patients. APOC1 expression in the serum of DN patients was 1.358±0.1292μg/ml, compared to 0.3683±0.08119μg/ml in the healthy population; the difference was statistically significant (P < 0.001). The ROC curve of APOC1 in DN gave an AUC = 92.5%, sensitivity = 95%, and specificity = 97% (P < 0.001). Conclusions: Our research indicates for the first time that APOC1 might be a novel diagnostic biomarker for diabetic nephropathy and suggests that APOC1 may serve as a candidate intervention target for DN.


Introduction
Diabetic nephropathy (DN) is one of the most serious complications of diabetes, and 45% of DN patients will progress to end-stage renal disease (ESRD) (1), which impairs quality of life and places a substantial economic burden on society (2). The gold standard for diagnosing diabetic kidney disease remains renal pathology, but renal puncture biopsy is invasive. In recent years, several candidate biomarkers have been proposed for the diagnosis of DN, such as KIM-1, NGAL, suPAR, and YKL-40 (3)(4)(5). However, there are still no valid and reliable biomarkers for the diagnosis of DN.
The GEO database, established by the National Center for Biotechnology Information (NCBI), is used to identify critical genes and the molecular mechanisms underlying disease pathogenesis and progression (6). Recently, bioinformatics and machine learning methods have been extensively employed in biomarker screening using the GEO database (7)(8)(9). Moreover, secreted proteins play important roles in biological activity, particularly in disease diagnosis and as future therapeutic targets (10,11). This provides an opportunity to detect novel plasma markers for identifying patients with DN.
This research aims to identify potential predictive plasma biomarkers of DN by data mining, which may generate novel insights into the mechanisms of DN pathogenesis and provide directions for future research into alternative therapies. If such biomarkers accurately predict the probability of DN occurring, the disease could be addressed through prevention and intervention at an early stage.

DEGs data processing
Expression profiles of the GSE96804 mRNA dataset were obtained from the GEO database (GPL17586 platform, Affymetrix Human Transcriptome Array 2.0) (12). In total, 61 tissue biopsies, 41 from DN patients and 20 from normal controls, were obtained from the National Clinical Research Center of Kidney Diseases, Jinling Hospital, Nanjing University School of Medicine. The "limma" package (13) in R was used to process the data, and the "ggplot2" (14) and "pheatmap" packages were used to draw the figures. DEGs were identified with |log fold change| ≥ 1 and adjusted P value < 0.05.
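The DEG selection rule above can be sketched as a simple filter. This is a minimal illustration in Python rather than the R code used in the study; the record layout (`gene`, `log_fc`, `adj_p`), the function name `select_degs`, and the toy values are all hypothetical.

```python
from typing import NamedTuple

class GeneStat(NamedTuple):
    gene: str
    log_fc: float  # log fold change, DN vs. control
    adj_p: float   # multiple-testing-adjusted P value

def select_degs(stats, lfc_cutoff=1.0, p_cutoff=0.05):
    """Return (up, down) gene lists passing |logFC| >= cutoff and adj P < cutoff."""
    up, down = [], []
    for s in stats:
        if abs(s.log_fc) >= lfc_cutoff and s.adj_p < p_cutoff:
            (up if s.log_fc > 0 else down).append(s.gene)
    return up, down

# Toy values for illustration only (not taken from GSE96804).
toy = [
    GeneStat("GENE_A", 1.8, 0.001),   # up-regulated, significant
    GeneStat("GENE_B", -1.2, 0.020),  # down-regulated, significant
    GeneStat("GENE_C", 0.6, 0.001),   # fold change too small
    GeneStat("GENE_D", 2.5, 0.300),   # not significant after adjustment
]
up, down = select_degs(toy)
print(up, down)
```

Both conditions must hold at once, so a gene with a large fold change but a non-significant adjusted P value is excluded.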

GO and KEGG enrichment analysis
GO analysis of the up- and down-regulated DEGs between DN and normal kidney tissue was conducted using the "clusterProfiler" (15), "GOplot", and "ggplot2" packages. KEGG pathway enrichment analysis was performed on the DEGs, and the figures were generated with the "ggplot2" and "enrichplot" packages.

WGCNA network construction and data analysis
Gene co-expression networks of DN patients were constructed from the GSE96804 microarray dataset with the "WGCNA" package (16). The soft-thresholding power was five when 0.9 was used as the correlation coefficient threshold, and 50 was chosen as the minimum number of genes per module. To merge possibly similar modules, we set the cut height threshold to 0.25. A heatmap of the correlations between modules and DN was drawn, and the ME-brown gene module was the most strongly related to DN.
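To illustrate what a soft-thresholding power does, the sketch below raises the absolute Pearson correlation between each pair of genes to a chosen power. It assumes an unsigned network, which the text does not state, and it is not the WGCNA package itself; the function names and toy expression values are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def soft_threshold_adjacency(expr, power=5):
    """Unsigned adjacency a_ij = |cor(x_i, x_j)| ** power for every gene pair."""
    genes = list(expr)
    return {
        (g, h): abs(pearson(expr[g], expr[h])) ** power
        for g in genes for h in genes if g != h
    }

# Toy expression profiles across four samples (illustrative only).
toy_expr = {
    "G1": [1.0, 2.0, 3.0, 4.0],
    "G2": [2.0, 4.0, 6.0, 8.0],  # perfectly correlated with G1
    "G3": [1.0, 3.0, 2.0, 4.0],  # correlation 0.8 with G1
}
adj = soft_threshold_adjacency(toy_expr, power=5)
print(adj[("G1", "G2")], adj[("G1", "G3")])
```

Raising to a power leaves strong correlations nearly intact and pushes weaker ones toward zero: here a correlation of 0.8 becomes an adjacency of about 0.33.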

Secreted genes download
729 secreted genes were obtained from the HPA database (https://www.proteinatlas.org). A Venn diagram (https://bioinfogp.cnb.csic.es/tools/venny/index.html) shows the genes shared by the 3 datasets (DEGs, WGCNA, and secreted-to-blood genes). From these common genes, we further filtered the core secretory genes using two machine learning algorithms (Lasso and SVM-RFE).
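The Venn-diagram step amounts to a set intersection of the three gene lists. A minimal sketch follows, with a hypothetical helper `common_genes` and placeholder gene names that are not the study's data.

```python
def common_genes(*gene_sets):
    """Return the sorted genes present in every supplied gene set."""
    sets = [set(s) for s in gene_sets]
    return sorted(set.intersection(*sets))

# Placeholder gene names for illustration only.
degs = {"G1", "G2", "G3", "G4"}
module_genes = {"G2", "G3", "G5"}
secreted = {"G2", "G3", "G6"}

shared = common_genes(degs, module_genes, secreted)
print(shared)
```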

Lasso algorithm and SVM-RFE algorithm data analysis
Lasso logistic regression is a machine learning method that selects covariates by seeking the λ value that minimizes the classification error (17). The "glmnet" package was used to build the Lasso model. SVM-RFE is a machine learning approach based on support vector machines that finds the optimal variables by recursively eliminating features according to the feature vectors generated by the SVM (18). Features of the differential genes were recursively ranked and eliminated using the "e1071" package, with the lapply function applied to rank all features of the training set. Ultimately, the feature set that minimized the error rate was taken as the hub genes.
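The recursive elimination idea behind SVM-RFE can be sketched as follows. This is a simplified illustration and not the e1071-based procedure used in the study: it assumes a bias-free linear SVM trained by batch subgradient descent on the hinge loss, ranks features by the squared weight w_j², and removes one feature per round. The function names, hyperparameters, and toy data are hypothetical.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Fit a bias-free linear SVM by batch subgradient descent on the hinge loss."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(epochs):
        grad = [lam * wj for wj in w]  # gradient of the L2 penalty
        for xi, yi in zip(X, y):
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            if margin < 1.0:  # sample violates the margin
                for j in range(d):
                    grad[j] -= yi * xi[j] / n
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w

def svm_rfe(X, y, names):
    """Rank features from most to least important by recursive elimination."""
    remaining = list(range(len(names)))
    eliminated = []
    while remaining:
        sub = [[row[j] for j in remaining] for row in X]
        w = train_linear_svm(sub, y)
        # Drop the feature with the smallest squared weight in this round.
        worst = min(range(len(remaining)), key=lambda k: w[k] ** 2)
        eliminated.append(remaining.pop(worst))
    return [names[j] for j in reversed(eliminated)]

# Toy data: geneA separates the classes strongly, geneB weakly, geneC not at all.
X = [
    [2.0, 0.3, 0.1],
    [2.0, 0.3, -0.1],
    [-2.0, -0.3, 0.1],
    [-2.0, -0.3, -0.1],
]
y = [1, 1, -1, -1]
ranking = svm_rfe(X, y, ["geneA", "geneB", "geneC"])
print(ranking)
```

In practice the ranking would be combined with cross-validated error rates to choose how many top-ranked features to keep; that step is omitted here.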

Presentation of hub genes
The common genes derived from these two machine learning algorithms are presented in a Venn diagram, heat maps, line plots, and deviation plots.

Biomarker expression validation and clinical relevance
As described in our previous research, biomarker expression was confirmed using the Nephroseq database (https://www.nephroseq.org/resource/main.html) (19). The same database was used to analyze the correlation between biomarker expression and renal function data.

Animal experiments
The STZ-induced DN mouse model was described in detail in our previous research (19); there were 5 mice in the control group (Ctrl) and 5 mice in the diabetic nephropathy group (DN). After the DN mouse model was successfully established, specimens were collected from the experimental animals. The research was approved by the Ethics Committee of Qilu Hospital, Shandong University (Approval No: KYLL-2020(KS)-030).

ELISA experiment
Serum specimens were collected from DN patients and healthy controls. The expression level of the marker in serum was measured with a commercial ELISA kit (Apolipoprotein CI ELISA kit, Abcam, ab108808, USA), following the experimental steps in the kit instructions.

ROC
The "pROC" package was used to construct receiver operating characteristic (ROC) curves to evaluate the diagnostic value of the hub gene for DN, as previously described (19).
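For readers unfamiliar with the quantities reported from an ROC analysis, the sketch below computes the AUC as the fraction of (patient, control) pairs ranked correctly, and picks a cutoff by maximising Youden's J. Using Youden's J is an assumption, since the text does not say how the reported sensitivity and specificity were chosen. This is not the pROC implementation; the function names and serum values are hypothetical.

```python
def auc(pos, neg):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def best_cutoff(pos, neg):
    """Return (cutoff, sensitivity, specificity) maximising Youden's J.

    A sample is called positive when its value is >= cutoff.
    """
    best = None
    for c in sorted(set(pos) | set(neg)):
        sens = sum(p >= c for p in pos) / len(pos)
        spec = sum(q < c for q in neg) / len(neg)
        j = sens + spec - 1
        if best is None or j > best[0]:
            best = (j, c, sens, spec)
    return best[1], best[2], best[3]

# Toy marker levels for illustration only (not the study's measurements).
patients = [1.2, 1.5, 0.9, 1.4]
controls = [0.3, 0.5, 1.0, 0.2, 0.4]

area = auc(patients, controls)
cutoff, sens, spec = best_cutoff(patients, controls)
print(area, cutoff, sens, spec)
```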

Statistical analyses
Data are expressed as mean ± SEM. R 4.1.2 was used to draw the figures, and GraphPad Prism 6.01 was used for statistical analysis. For comparisons between two groups, Student's t-test was used if the data followed a normal distribution, and the Kruskal-Wallis test was used for non-normally distributed data. Correlations between two variables were assessed with the Spearman test. Statistical significance was set at P < 0.05; *P < 0.05, **P < 0.01, ***P < 0.001.
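Two of the summary statistics named above can be written out in a few lines. The sketch below computes mean ± SEM (sample standard deviation divided by the square root of n) and the Spearman coefficient (Pearson correlation of average ranks). It is illustrative only and is not how GraphPad Prism computes them internally; the function names and data are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def mean_sem(values):
    """Return (mean, standard error of the mean) using the sample SD."""
    return mean(values), stdev(values) / sqrt(len(values))

def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)

# Toy data for illustration only.
m, sem = mean_sem([2.0, 4.0, 6.0, 8.0])
rho_inverse = spearman([1, 2, 3, 4], [10, 9, 7, 1])  # strictly decreasing
rho_partial = spearman([1, 2, 3, 4], [1, 3, 2, 4])
print(m, sem, rho_inverse, rho_partial)
```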

Characterisation of genes for DN using GSE96804 microarray data
The experimental design is illustrated in Figure 1. Compared with controls, 504 DEGs were identified in DN patients. The results are summarized in the volcano plot, which shows that 257 genes are up-regulated and 247 genes are down-regulated in DN (Figure 2A). In the illustration, red represents up-regulated genes and green indicates down-regulated genes. The heatmap shows that the samples form two clusters, Control and DN, representing the control group and the DN patients (Figure 2B). GO analysis of the DEGs identified shared GO terms linked to organic acid catabolic processes and extracellular matrix organization (Figure 2C). Enrichment pathways to KEGG are
FIGURE 1 Research flow chart.

Hub gene screening for DN by WGCNA
Network topologies were analyzed for a range of soft-threshold powers, and a power of 11 was considered reasonable for constructing the co-expression network (Figure 3A). Similarity in gene expression was assessed by pairwise weighted correlation, and clustering was performed using the topological overlap measure. Gene modules are marked by color at the bottom (Figure 3B). Pearson correlation coefficients between module eigengenes (ME) and disease status were calculated for all modules to show how closely each module relates to DN. ME-brown (R = 0.53, P = 1e-05) potentially represented particular features of DN patients (Figure 3C). Furthermore, the correlation between gene significance (GS) for DN and module membership was high in the brown module (R = 0.47, P = 3.1e-21, Figure 3D). The heightened co-expression of the genes in the ME-brown module may therefore be biologically relevant.

Screening of hub secretory genes for DN by machine algorithms
A Venn diagram of the genes common to the three gene sets (DEGs, WGCNA module genes, and secreted genes) yielded 17 potential secretory genes that may be functionally essential in DN (Figure 4A). The 17 secreted genes are displayed in Figure 4B. Two machine learning algorithms, Lasso and SVM-RFE, were then used to identify the characteristic genes of DN. The Lasso algorithm selected 9 potential hub genes (Figures 4C, D), while the SVM-RFE algorithm selected 7 potential hub genes (Figures 4E, F).

Expression of 6 secretory genes in DN
The Venn diagram shows that the 2 machine learning algorithms shared 6 hub secretory genes (APOC1, CCL21, INHBA, RNASE6, TGFBI, VEGFC; Figure 5A). The expression of the six genes in the GSE96804 cohort is illustrated by a heatmap, line graphs, and deviation plots (Figures 5B-D). The results revealed that the 6 secretory genes screened in this research were significantly over-expressed in the diabetic nephropathy population.

Associated expression of APOC1 in DN
APOC1 expression was elevated in patients with diabetic nephropathy across multiple cohorts (GSE96804, GSE47185, and GSE30122 in the GEO database, and the ERCB Nephrotic Syndrome TubInt cohort in the Nephroseq database; Figures 6A-D). ApoC1 expression was elevated in the kidney tissue of mice with DN by Western blot (P < 0.05, Figure 6E). Moreover, immunohistochemistry of mouse kidney tissue showed that APOC1 was expressed predominantly in the glomerulus (Figure 6F). These findings were confirmed by tissue immunofluorescence (Figure 6G).

Plasma expression of APOC1 in DN patients and ROC curve analysis
Altogether, 20 healthy controls and 20 DN patients were enrolled in the research, and their baseline details are presented in Table 1. ELISA results demonstrated that APOC1 expression in the serum of DN patients was 1.358±0.1292mg/ml, compared to 0.3683±0.08119mg/ml in the healthy population (Figure 8A). APOC1 was significantly elevated in the sera of DN patients, and the difference was statistically significant (P < 0.001). Furthermore, ROC curves demonstrated the diagnostic effectiveness of APOC1 for DN (AUC = 92.5%, sensitivity = 95%, and specificity = 97%, P < 0.001, Figure 8B).

Discussion
DN is considered the most serious complication of diabetes and imposes a substantial financial burden on individuals and society (22). Early diagnosis is vital to improve the prognosis of patients with DN and reduce this burden (23). However, UACR and eGFR remain the dominant indicators for diagnosing DN in clinical practice (24). Previous studies have demonstrated that kidney damage, such as endothelial damage, tubulointerstitial dilatation, and interstitial fibrosis, has already occurred before albuminuria appears in patients with DN (25). Abnormalities in molecular markers usually precede the clinical symptoms of the disease (26,27). Therefore, the urgent challenge is to identify suitable, stable, and easily detectable biomarkers for DN diagnosis.
Microarrays have been extensively used in medical research, for example to find biomarkers for disease diagnosis and prognosis (28,29). Accordingly, we investigated the differential genes between kidney tissue of diabetic nephropathy patients and healthy people by microarray transcriptome analysis (Figure 2). The analysis identified 257 up-regulated and 247 down-regulated genes relative to normal kidney tissue. Furthermore, we screened for gene modules closely correlated with diabetic nephropathy by the WGCNA method (Figure 3). 17 secretory genes were obtained from the differential genes and the ME-brown module (Figures 3, 4), which may have an essential role in DN.
Our investigation further screened for core secretory genes in diabetic nephropathy using the Lasso and SVM-RFE machine learning algorithms, which identified a total of six potential core genes (Figures 4, 5). Among the six secreted genes, APOC1, a member of the apolipoprotein family, is newly identified here and is closely associated with lipid metabolism and immune inflammation. Our research demonstrated elevated expression of APOC1 in DN, which was also confirmed in other transcriptome microarray datasets (Figure 6). Lipid metabolism disorders and immunoinflammatory responses are critical in the development and progression of DN (30,31), which suggests that APOC1 may also be involved in the development of DN. APOC1 has been implicated in the progression of many diseases, such as malignancy (32), atherosclerosis (33), and Alzheimer's disease (34). More importantly, APOC1 is closely associated with cell proliferation, apoptosis, and immune inflammation (35). Recent research has also found that ApoC1 promotes renal clear cell carcinoma metastasis through activation of the STAT3 pathway (36) and is a potential novel diagnostic and prognostic marker for clear cell renal carcinoma (37). Animal experiments were used to confirm these results. In vivo, using a mouse model of diabetic nephropathy, we demonstrated that APOC1 expression was significantly increased in diabetic nephropathy kidney tissues, mainly in the glomerulus (Figure 6). Our team is currently conducting functional and mechanistic research on the role of APOC1 in DN.
We also conducted a correlation analysis between APOC1 and clinical data, examining the correlation of APOC1 expression with urinary protein and eGFR in DN patients through the Nephroseq database (Figure 7). For further evidence, we collected blood samples from 20 patients with DN and 20 healthy controls and assayed the serum level of APOC1 by ELISA. The results showed that APOC1 expression was significantly higher in DN patients and had excellent diagnostic efficacy for DN (Figure 8). Therefore, we concluded that APOC1 may be a novel biomarker for DN. Nevertheless, our research has several limitations. The role and mechanism of APOC1 in the development of DN are still unclear. The diagnostic efficacy of APOC1 for DN still needs to be demonstrated in multicentre research. Additionally, the relationship between APOC1 expression and the prognosis of DN patients needs further prospective investigation.
In conclusion, elevated glomerular and serum expression of APOC1 in DN was identified for the first time through bioinformatics, machine learning, animal model experiments, and clinical data. APOC1 is a novel potential diagnostic biomarker for DN, but additional prospective research is needed to confirm its diagnostic value.

Data availability statement
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found in the article/Supplementary Material.

Ethics statement
The studies involving human participants were reviewed and approved by the Ethics Committee of Qilu Hospital, Shandong University (Approval No: KYLL-2020(KS)-030). The patients/participants provided their written informed consent to participate in this study. The animal study was reviewed and approved by the Ethics Committee of Qilu Hospital, Shandong University.

Acknowledgments
We thank all who participated in this article.

Conflict of interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher's note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.